Caio Raphael

Byte

A byte is the smallest addressable unit of memory on a system.

Size

A byte is not always the same as a u8 (unsigned 8-bit integer), although they are often treated that way in modern systems.
- On almost all modern hardware:
  - 1 byte = 8 bits
- So a byte happens to match a u8
- That’s why in practice people often treat them as equivalent.
Its size is defined by the architecture, not the language.
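6-bit character (byte) systems:
- IBM 1401
- CDC 6600
- Reason:
  - Early character sets fit in 6 bits (64 symbols)
  - The IBM 1401 was optimized for text and business data; the CDC 6600 packed ten 6-bit characters into each 60-bit word
9-bit byte systems:
- DEC PDP-10
- Reason:
  - Used 36-bit words, which split evenly into 4 × 9-bit bytes.
  - Byte size was selectable per instruction (1 to 36 bits), so 9 bits was one convenient choice rather than a fixed hardware size.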
Today, virtually all general-purpose CPUs use 8-bit bytes.
This is standardized in practice because:
- ASCII (7-bit) → extended encodings use 8 bits
- Hardware, networking, storage all aligned around 8 bits

C/C++ standard

sizeof(char) == 1 → this is a byte
But a byte is only guaranteed to be at least 8 bits (CHAR_BIT >= 8), not exactly 8.
Theoretically:
- 1 byte could be 16 bits
- Then:
  - char = 1 byte = 16 bits
  - u8 = 8 bits → not the same thing
uint8_t is optional: it exists only if the platform actually supports an exact 8-bit type
char is not guaranteed to be 8 bits

Rust

u8 is always 8 bits
u8 is effectively the language’s “byte”
Rust assumes 8-bit bytes for supported platforms
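The guarantees above can be checked directly from Rust. This is a minimal sketch; `byte_facts` and `bytes_needed` are hypothetical helper names introduced only for illustration.

```rust
use std::mem::size_of;

/// Returns (bits in a u8, size of u8 in bytes, size of u64 in bytes).
fn byte_facts() -> (u32, usize, usize) {
    (u8::BITS, size_of::<u8>(), size_of::<u64>())
}

/// Number of 8-bit bytes needed to hold `bits` bits, rounded up.
fn bytes_needed(bits: u32) -> u32 {
    (bits + u8::BITS - 1) / u8::BITS
}
```

For example, a 36-bit word like the PDP-10's would need `bytes_needed(36)` = 5 of today's 8-bit bytes.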

word, dword, qword

A word is the natural data size of a CPU—the size it processes most efficiently.
Historically: tied directly to register width.
Today: still loosely tied to architecture, but terminology is often legacy-driven.
The CPU/ISA defines them. They are not defined by the OS.
On x86, word = 16 bits (from the original 16-bit CPUs like the 8086)
dword = “double word” = 32 bits
qword = “quad word” = 64 bits
Example x86-64 assembly (Intel syntax), loading a 32-bit value from the address in rbx:
```
mov eax, dword ptr [rbx]
```

Name confusion

These meanings are not universal, just widely adopted due to x86.
Architectural definition:
- A word = register size
- 16-bit CPU → 16-bit word
- 32-bit CPU → 32-bit word
- 64-bit CPU → 64-bit word
x86 legacy usage:
- word = 16 bits (even on 64-bit CPUs)
- dword = 32 bits
- qword = 64 bits
So on modern x86, a “word” is no longer the natural CPU size.
Low-level languages usually avoid ambiguity:
- C/C++:
  - Avoid “word” entirely
  - Use uint32_t, uint64_t, etc.
- Odin / Rust:
  - Explicit sizes (u8, u16, u32, u64)
  - No reliance on “word” terminology
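To make the naming concrete, here is a small sketch that maps the x86 terms onto Rust's explicit-width types. The aliases `Word`, `Dword`, and `Qword` are hypothetical and not part of Rust's standard library; they exist only to mirror the x86 vocabulary.

```rust
// Hypothetical aliases mirroring x86 terminology (not standard Rust names).
type Word = u16;
type Dword = u32;
type Qword = u64;

/// Widths in bits of the x86-style word, dword, and qword.
fn x86_widths() -> [u32; 3] {
    [Word::BITS, Dword::BITS, Qword::BITS]
}

/// Width in bits of the target's pointer-sized integer,
/// which is the closest thing Rust has to a "natural" size.
fn native_pointer_bits() -> u32 {
    usize::BITS
}
```

On a 64-bit target `native_pointer_bits()` is 64 while the x86 "word" stays at 16, which is the mismatch described above.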

Register Efficiency

"Is using a byte (u8) on a 64-bit system less efficient than using a u64?"
Using u8 is not inherently inefficient on a 64-bit system, but there are cases where u64 is faster.
A 64-bit CPU (e.g., x86-64, ARM64) is optimized for 64-bit registers, so:
- Operations on u64:
  - Usually map directly to single instructions
  - Fully utilize registers
- Operations on u8:
  - Often get promoted to 32 or 64 bits internally
  - May involve extra masking or extension instructions
So for pure arithmetic:
- u64 → often more efficient
- u8 → sometimes slightly less efficient
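The "promote, then mask" idea can be sketched as follows. This only illustrates the semantics; which instructions a compiler actually emits depends on the target and optimization level. Both function names are hypothetical.

```rust
/// Adds two u8 values, wrapping at 8 bits.
fn add_u8(a: u8, b: u8) -> u8 {
    a.wrapping_add(b)
}

/// The same addition done at 64 bits, then narrowed back to 8 bits.
fn add_via_u64(a: u8, b: u8) -> u8 {
    let wide = u64::from(a) + u64::from(b); // zero-extend both operands, add at full width
    (wide & 0xFF) as u8 // mask off everything above the low 8 bits
}
```

Both functions return the same result for every input pair; the second simply spells out the extension and masking steps that the first leaves implicit.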
Cases where u8 can actually be more efficient:
- u8 uses one-eighth the memory of u64
- Smaller data:
  - Better cache utilization
  - Fewer cache misses
  - Higher bandwidth efficiency
- Example, processing a large array:
  - u8[] → more data fits in cache → often faster overall
  - u64[] → fewer elements per cache line
- Modern CPUs use SIMD heavily:
  - With u8:
    - You can process 16–64 elements at once (16 with 128-bit SSE2, 32 with AVX2, 64 with AVX-512)
  - With u64:
    - Only 2–8 elements at once
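The element counts above follow from simple division. The sketch below assumes a 64-byte cache line, which is typical but not universal; `elements_per` and `simd_lanes` are hypothetical helpers.

```rust
use std::mem::size_of;

/// How many values of type T fit in `bytes` bytes (for example, one cache line).
fn elements_per<T>(bytes: usize) -> usize {
    bytes / size_of::<T>()
}

/// How many values of type T fit in one SIMD register of `register_bits` bits.
fn simd_lanes<T>(register_bits: usize) -> usize {
    register_bits / (size_of::<T>() * 8)
}
```

With a 64-byte line, `elements_per::<u8>(64)` is 64 and `elements_per::<u64>(64)` is 8; with a 256-bit register, `simd_lanes::<u8>(256)` is 32 and `simd_lanes::<u64>(256)` is 4.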

CPU Architectures

| Term  | Architecture  | Bits   | Notes                                                   |
| ----- | ------------- | ------ | ------------------------------------------------------- |
| x86   | IA-32 (Intel) | 32-bit | Family began as 16-bit (8086); now usually means 32-bit |
| x64   | x86-64        | 64-bit | Extension of x86                                        |
| amd64 | x86-64        | 64-bit | Same as x64 (AMD name)                                  |
| arm64 | ARM (AArch64) | 64-bit | Different ISA entirely                                  |

x86 (32-bit)

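The family started at 16 bits with the 8086 and became 32-bit with the 80386; today “x86” on its own usually means the 32-bit variant.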
Key properties:
- 32-bit registers (EAX, EBX, etc.)
- 32-bit address space (2^32 bytes = 4 GiB limit)
- Complex instruction set (CISC)
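The 4 GiB figure comes straight from the pointer width. A minimal sketch of the arithmetic, with `address_space_bytes` and `GIB` as hypothetical names:

```rust
/// One gibibyte, in bytes.
const GIB: u128 = 1 << 30;

/// Number of distinct byte addresses reachable with `bits`-bit pointers.
/// Returns u128 so that `bits = 64` does not overflow.
fn address_space_bytes(bits: u32) -> u128 {
    1u128 << bits
}
```

`address_space_bytes(32) / GIB` gives 4, while the same division for 64 bits gives 2^34 GiB, which is why the jump to 64-bit addressing removed the limit in practice.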
When someone says “x86 build”, they usually mean a 32-bit binary.
Why is it called x86?
- Refers to the classic Intel architecture line:
  - 8086 → 80286 → 80386 → 80486 → Pentium...
- Instead of listing all of them, people started referring to the whole family as “x86”: any processor in the *86 family.
- The “x” is just a wildcard.

x64 / amd64 (64-bit x86)

x64 and amd64 are the same thing.
AMD created the 64-bit extension to x86 and called it amd64.
Intel adopted it (and calls it Intel 64).
So:
x64 = amd64 = x86-64
Key properties:
- 64-bit registers (RAX, RBX, etc.)
- Much larger address space (64-bit pointers; current CPUs implement 48 or 57 bits of virtual address)
- Backward compatible with 32-bit x86

ARM64 (AArch64)

Completely different architecture from x86.
Designed by ARM Holdings
Used in:
- Phones
- Tablets
- Apple Silicon (M1/M2/M3)
- Many servers now
Key properties:
- 64-bit only (in the AArch64 execution state)
- Simpler instruction set (RISC)
- Different registers and instructions from x86
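Since the same architecture goes by several names, tools often normalize them. The sketch below maps the aliases from this section onto the names Rust uses for `target_arch`; `normalize_arch` is a hypothetical helper and its alias list is illustrative, not exhaustive. `std::env::consts::ARCH` is the standard-library constant reporting the architecture the program was compiled for.

```rust
/// Maps common architecture aliases to Rust's `target_arch` spelling.
/// Returns None for names this sketch does not know about.
fn normalize_arch(name: &str) -> Option<&'static str> {
    match name.to_ascii_lowercase().as_str() {
        "x86" | "i386" | "i686" | "ia-32" => Some("x86"),
        "x64" | "amd64" | "x86-64" | "x86_64" | "intel 64" => Some("x86_64"),
        "arm64" | "aarch64" => Some("aarch64"),
        _ => None,
    }
}

/// Architecture this program was compiled for, as reported by the standard library.
fn current_arch() -> &'static str {
    std::env::consts::ARCH
}
```

For instance, `normalize_arch("amd64")` and `normalize_arch("x64")` both yield `Some("x86_64")`, while `normalize_arch("arm64")` yields `Some("aarch64")`.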

What the CPU provides vs Typing

u8, u16, u32, u64
i8, i16, i32, i64
int, uint
bool
pointers
struct
etc.
These types are ways for a programming language to describe how many bits to use and how to interpret them.
The CPU itself only sees bits and instructions.
The hardware supports multiple widths, but it does not define “types”.
A CPU (ISA) defines:
- Register sizes (e.g., 64-bit registers on x64)
- Operand widths (8, 16, 32, 64-bit operations)
- Operations: add, sub, mul, load/store, etc.
Example (x86-64 idea):
- add rax, rbx → 64-bit add
- add eax, ebx → 32-bit add
- add al, bl → 8-bit add
The CPU doesn’t inherently “know” signed vs unsigned.
- Difference comes from which instructions you use:
  - Unsigned division → div
  - Signed division → idiv
  - Signed vs unsigned comparisons → same cmp, different conditional jumps (jl/jg vs jb/ja)
  - Addition and subtraction → same instruction for both (two's complement)
- Signedness is a semantic layer imposed by the compiler, not a stored property.
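A short sketch of that point: the same 8 bits give different results only when the operation itself depends on signedness. The function names are hypothetical.

```rust
/// Reads the same 8 bits as unsigned and as signed.
fn interpret(bits: u8) -> (u8, i8) {
    (bits, bits as i8)
}

/// Divides the same bit pattern by 2 under both interpretations.
fn halve(bits: u8) -> (u8, i8) {
    (bits / 2, (bits as i8) / 2)
}

/// Adds two bit patterns as unsigned and as signed, returning the raw bits of each result.
fn add_bits(a: u8, b: u8) -> (u8, u8) {
    let unsigned = a.wrapping_add(b);
    let signed = (a as i8).wrapping_add(b as i8) as u8;
    (unsigned, signed)
}
```

`interpret(0xFF)` returns `(255, -1)` and `halve(0xF8)` returns `(124, -4)`, whereas `add_bits` always returns two identical values, because addition does not care about signedness.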
For pointers, the CPU just sees a number:
- 32-bit → pointer = 32-bit integer
- 64-bit → pointer = 64-bit integer
- There's no special “pointer hardware type”.
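In Rust this can be observed by casting a pointer to `usize`. A minimal sketch, with `address_of` and `pointer_bits` as hypothetical helpers:

```rust
use std::mem::size_of;

/// Returns the numeric address of a u32 value.
fn address_of(value: &u32) -> usize {
    value as *const u32 as usize
}

/// Width of a raw pointer in bits on the compilation target.
fn pointer_bits() -> usize {
    size_of::<*const u32>() * 8
}
```

For two adjacent elements of a `[u32; 2]` array, the addresses returned by `address_of` differ by exactly 4, the size of one `u32`.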
A struct is a group of fields with layout rules; a memory pattern. The CPU just sees contiguous memory.
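The layout rules can be made visible by measuring field offsets. This sketch uses a hypothetical `Packet` struct with `#[repr(C)]` so that fields keep their declared order, and it assumes a target where `u32` is 4-byte aligned (true on x86-64 and ARM64).

```rust
use std::mem::{align_of, size_of};

/// Hypothetical example struct; repr(C) keeps fields in declaration order.
#[repr(C)]
struct Packet {
    tag: u8,    // offset 0, followed by 3 padding bytes
    value: u32, // offset 4 (must be 4-byte aligned)
    flag: u16,  // offset 8, followed by 2 trailing padding bytes
}

/// Byte offsets of each field, measured from the start of the struct.
fn field_offsets() -> [usize; 3] {
    let p = Packet { tag: 0, value: 0, flag: 0 };
    let base = &p as *const Packet as usize;
    [
        &p.tag as *const u8 as usize - base,
        &p.value as *const u32 as usize - base,
        &p.flag as *const u16 as usize - base,
    ]
}

/// Total size and alignment of the struct in bytes.
fn packet_size_and_align() -> (usize, usize) {
    (size_of::<Packet>(), align_of::<Packet>())
}
```

The fields hold only 7 bytes of data, yet the struct occupies 12 because of padding; to the CPU it is still just 12 contiguous bytes.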